[MLOB-7510] added session-level eval documentation #36958
Preview links (active after the
Created DOCS-14508 for editorial review.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pushed some formatting changes, as well as a significant amount of section reorganization, most notably:
cswatt left a comment
Please review the changes I've made, since the structure of the page has been significantly altered. If you absolutely need to add back any of the justification material I've removed, we can discuss!
I also noted an apparent discrepancy in the configuration instructions.
| text: "Tracking user sessions" | ||
| --- | ||
| A session-level evaluation runs once per [user session][9], with every trace—and every span in those traces—available to the LLM judge in a single prompt. Sessions group related interactions under a shared `session_id` (for example, a chat conversation) and can include multiple traces over an extended interaction. |
The opening leads with mechanics ("runs once per user session") before the reader knows what the feature does for them.
Suggestion, something like:
A session-level evaluation runs a custom LLM-as-a-judge across an entire [user session],
every trace, and every span in those traces, in a single prompt. Use it to score things that
only make sense across a whole interaction: whether the user's goal was met, whether the
assistant stayed coherent across turns, or whether a user grew frustrated over time.
Sessions group related interactions under a shared session_id (for example, a chat
conversation) and can span multiple traces. Session scope sees context that trace- and
span-level judges cannot, because those judges only see a single request or span.
| {{< img src="llm_observability/evaluations/session_level_evaluation_scope.png" alt="The Evaluate On scope picker with Session selected." style="width:100%;" >}} | ||
| <div class="alert alert-info">A session is considered complete after 30 minutes of inactivity (no new spans for that session, measured from the most recent span), at which point the evaluation runs. Spans that arrive more than 30 minutes after the previous span are not included in the evaluation.</div> |
We don't need this; there is a whole section for it.
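For reviewers checking the wording of that alert, the completion rule it describes can be sketched as follows. This is only an illustration of the rule as quoted, not Datadog's implementation: the function names are hypothetical, span timestamps are assumed to stand in for arrival times, and the handling of a gap of exactly 30 minutes is an assumption.

```python
from datetime import datetime, timedelta

# 30 minutes of inactivity, as stated in the quoted alert.
INACTIVITY_WINDOW = timedelta(minutes=30)


def spans_in_evaluation(span_times):
    """Return the span timestamps that would be included in the evaluation.

    Walks the spans in time order and stops at the first gap longer than
    the inactivity window; later spans are treated as outside the session.
    (Hypothetical helper, for illustration only.)
    """
    included = []
    for ts in sorted(span_times):
        if included and ts - included[-1] > INACTIVITY_WINDOW:
            break  # this span arrived too long after the previous one
        included.append(ts)
    return included


def is_complete(span_times, now):
    """Return True once the window has elapsed since the last included span.

    Assumption: a gap of exactly 30 minutes counts as complete.
    """
    included = spans_in_evaluation(span_times)
    if not included:
        return False
    return now - included[-1] >= INACTIVITY_WINDOW


# Example: the third span arrives 35 minutes after the second one,
# so only the first two spans are part of the evaluated session.
start = datetime(2025, 1, 1, 12, 0)
spans = [start, start + timedelta(minutes=10), start + timedelta(minutes=45)]
evaluated = spans_in_evaluation(spans)
```

Writing the rule out this way makes the two statements in the alert easier to compare: one describes when the evaluation fires, the other describes which spans it sees.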
| 1. Pick a sample session from the panel on the right. The pane lists the traces in that session, with the fields referenced by your prompt highlighted. | ||
| {{< img src="llm_observability/evaluations/session_level_sample_session.png" alt="The configuration page in session scope, with the sample session pane on the right showing traces and highlighted span fields." style="width:100%;" >}} |
Suggest removing this. The image below is enough.
| {{< img src="llm_observability/evaluations/session_level_sample_session.png" alt="The configuration page in session scope, with the sample session pane on the right showing traces and highlighted span fields." style="width:100%;" >}} | ||
| Clicking on a session then lists the traces in that session, with the fields referenced by your prompt highlighted. |
Suggest removing this text.
Also -
What does this PR do? What is the motivation?
This PR adds documentation for session-level evals.
Merge instructions
Merge readiness:
For Datadog employees:
Your branch name MUST follow the `<name>/<description>` convention and include the forward slash (/). Without this format, your pull request will not pass CI, the GitLab pipeline will not run, and you won't get a branch preview. Getting a branch preview makes it easier for us to check any issues with your PR, such as broken links. If your branch doesn't follow this format, rename it or create a new branch and PR.
[6/5/2025] Merge queue has been disabled on the documentation repo. If you have write access to the repo, the PR has been reviewed by a Documentation team member, and all of the required checks have passed, you can use the Squash and Merge button to merge the PR. If you don't have write access, or you need help, reach out in the #documentation channel in Slack.
AI assistance
Additional notes